The Usefulness of Logical Structure in Flexible Document Categorization
Authors
Abstract
This paper presents a new approach to automatic document categorization. By exploiting the logical structure of the document, the approach assigns an HTML document to one or more categories (thesis, paper, call for papers, email, ...). From a set of training documents, it generates a set of rules that are then used to categorize new documents. The approach is made flexible by associating each rule with a weight that represents its importance in discriminating between the possible categories. This weight is modified dynamically each time a new document is categorized. Experiments with the proposed approach give satisfactory results. Keywords: categorization rule, document categorization, flexible categorization, logical structure.
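The abstract does not specify the rule format, the scoring function, or the weight-update formula, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that the logical structure can be approximated by the heading texts of an HTML page, that a document is assigned every category whose summed rule weights reach a threshold, and that weights are nudged up or down by a fixed step after each categorization. The names `StructureExtractor`, `Rule`, `categorize`, `threshold` and `step` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


class StructureExtractor(HTMLParser):
    """Collects lowercased h1-h3 heading texts as a crude stand-in for logical structure."""

    def __init__(self):
        super().__init__()
        self.headings = set()
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3"):
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading and data.strip():
            self.headings.add(data.strip().lower())


@dataclass
class Rule:
    element: str   # logical element whose presence fires the rule (assumption)
    category: str  # category the rule votes for
    weight: float  # importance of the rule for discrimination


def categorize(html, rules, threshold=1.0, step=0.1):
    """Return the sorted categories whose summed weights reach the threshold.

    Side effect (hypothetical update scheme): fired rules that agree with an
    assigned category gain `step`; fired rules that disagree lose `step`.
    """
    parser = StructureExtractor()
    parser.feed(html)
    fired = [r for r in rules if r.element in parser.headings]

    scores = {}
    for r in fired:
        scores[r.category] = scores.get(r.category, 0.0) + r.weight
    assigned = sorted(c for c, s in scores.items() if s >= threshold)

    for r in fired:
        if r.category in assigned:
            r.weight += step
        else:
            r.weight = max(0.0, r.weight - step)
    return assigned


# Illustrative rules; in the paper they would be generated from training documents.
rules = [
    Rule("abstract", "paper", 0.6),
    Rule("references", "paper", 0.6),
    Rule("important dates", "call for papers", 0.8),
]
html = "<h1>Title</h1><h2>Abstract</h2><p>text</p><h2>References</h2>"
result = categorize(html, rules)
print(result)
```

Because a document can exceed the threshold for several categories at once, the sketch naturally supports assignment to one or more categories; the additive update is just one simple way to let rule weights drift with use.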
Similar resources
Logical Foundations of Designing a Public Policy-Making System for Achieving Truth-Oriented Justice (Based on Nahj al-Balagha)
This article presents part of the findings of a research project designed to determine the characteristics of a desirable public policy-making system for achieving social justice. To begin with, James P. Sterba's categorization of alternative political perspectives on justice is reviewed, and then "truth-oriented" justice is studied. To reach a precise and scholar...
Text Type Structure And Logical Document Structure
Most research on automated categorization of documents has concentrated on the assignment of one or many categories to a whole text. However, new applications, e.g. in the area of the Semantic Web, require a richer and more fine-grained annotation of documents, such as detailed thematic information about the parts of a document. Hence we investigate the automatic categorization of text segments...
A Flexible Skew-Generalized Normal Distribution
In this paper, we consider a flexible skew-generalized normal distribution. This distribution is denoted by $FSGN(\lambda_1, \lambda_2, \theta)$. It contains the normal, skew-normal (Azzalini, 1985), skew generalized normal (Arellano-Valle et al., 2004) and skew flexible-normal (Gomez et al., 2011) distributions as special cases. Some important properties of this distribution are establi...
Oil and Iran Regions Rural Economic Structure Alteration
Oil has gradually gained a predominant place in the national economy since 1950 and is now the main resource securing the country's financial needs. This research is based on two questions concerning the conflict between oil rent and the traditional economic sectors, including agriculture and livestock rearing, a conflict that has steadily intensified. These two questions are as follows: what ar...
Apply Uncertainty in Document-Oriented Database (MongoDB) Using F-XML
As we move into a big data world in which unstructured data grows at high velocity, a data store is needed to hold this large volume of data. Relational databases, the traditional choice, are not well suited to handling data at this scale, so it is necessary to move to non-relational data stores. In the current study, we have proposed an extension of the Mongo...